ABSTRACT

The large-angle, low multipole cosmic microwave background (CMB) provides a unique view 
of the largest angular scales in the Universe. Study of these scales is hampered by the facts 
that we have only one Universe to observe, only a few independent samples of the underlying 
statistical distribution of these modes, and an incomplete sky to observe due to the interpos- 
ing Galaxy. Techniques for reconstructing a full sky from partial sky data are well known 
and have been applied to the large angular scales. In this work we critically study the
reconstruction process and show that, in practice, the reconstruction is biased due to leakage of
information from the region obscured by foregrounds to the region used for the reconstruction. 
We conclude that, despite being suboptimal in a technical sense, using the unobscured region 
without reconstructing is the most robust measure of the true CMB sky. We also show that for
noise-free data the usual optimal, unbiased estimator may be employed without smoothing,
thus avoiding the leakage problem. Unfortunately, directly applying this to real data with
noise and residual, unmasked foregrounds yields highly biased reconstructions; further care
is required to apply this method successfully to the real-world CMB.

Key words: cosmology: cosmic microwave background - cosmology: large-scale structure 
of Universe 



1 INTRODUCTION 

Several prominent anomalies in the large-angle, low-$\ell$ cosmic microwave background
(CMB) have been identified, starting with pioneering observations by the Cosmic Background
Explorer (COBE) (Bennett et al. 1996), and confirmed and extended with the high precision
observations from the Wilkinson Microwave Anisotropy Probe (WMAP) (Bennett et al. 2003).
These anomalies include the unexpectedly low correlations at scales above 60 degrees
(Bennett et al. 2003; Copi et al. 2010; Sarkar et al. 2011), the alignments of the largest
multipoles with each other and the Solar System (de Oliveira-Costa et al. 2004; Schwarz et
al. 2004; Land & Magueijo 2005; Copi et al. 2006), a parity asymmetry at low multipoles
(Kim & Naselsky 2010a,b,c), and the spatial asymmetries in the distribution of power
observed at smaller scales (Eriksen et al. 2004a,b; Hansen et al. 2009). Numerous attempts
have been made to explain or explain away these anomalies (Slosar & Seljak 2004; Hajian
2007; Afshordi et al. 2009; Bennett et al. 2011), none of them successful (see Copi et al.
2010, and references therein, for a review).

The most peculiar and robust CMB anomaly is arguably the lack of correlation on large
angular scales, first observed by COBE (Bennett et al. 1996) and confirmed and further
quantified through the $S_{1/2}$ statistic by WMAP (Spergel et al. 2003). Subsequent study
of the two point angular correlation function, $C(\theta)$, has found further oddities: the
large-angle correlation is mainly missing outside of the Galactic region, there being
essentially no correlation on large angles. The large-angle correlation that is observed
comes from the foreground removed Galactic region of the reconstructed full-sky map (Copi et
al. 2009). From the internal linear combination (ILC) map, the full-sky map created from the
individual frequency bands which provides our best picture of the full sky microwave
background radiation, it is found that the lack of correlation is unlikely at the
approximately 95 per cent level. However, when solely the region outside the Galaxy of the
individual frequency or ILC maps is analysed the lack of correlation is rare at the
approximately 99.975 per cent level (Copi et al. 2009).

The study of the large-angle CMB presents special problems that must be treated carefully.
Since there is only one Universe to observe and few independent modes at low $\ell$, large
sky coverage is needed, and even with this coverage, very little independent information
about the ensemble is available. Further, given the observed low quadrupole power, $C_2$,
relative to the best fit $\Lambda$CDM model, $C_2^{\Lambda{\rm CDM}} \approx 1300\,(\mu{\rm K})^2$,
large-angle studies are particularly sensitive to assumptions and unintended biases.

One suggestive example of this is provided by the ILC map itself. If we use a pixel based
estimator for the $C_\ell$ as implemented in SpICE (Chon et al. 2004) we can easily
determine the quadrupole power inside and outside the WMAP provided analysis mask KQ75y7
to be

$$ C_2^{\rm inside} \approx 610\,(\mu{\rm K})^2, \qquad C_2^{\rm outside} \approx 80\,(\mu{\rm K})^2. \tag{1} $$



The KQ75y7 mask cuts out approximately 25 percent of the sky. 
Taking the weighted average of these values produces the intriguing 
result 



$$ 0.25\,C_2^{\rm inside} + 0.75\,C_2^{\rm outside} \approx 200\,(\mu{\rm K})^2, \tag{2} $$



a value consistent with the WMAP reported $C_2$ (Larson et al. 2011).
Again we stress this is a suggestive example, not a careful analysis;
the pseudo-$C_\ell$ (PCL) estimator employed here is suboptimal, we
have not included errors on the estimates, etc. It does, however, show
the wide discrepancy between the Galactic region and the rest of the
sky, a common theme for the ILC map. Further, it shows how a large
value mixed in from a small region of the sky significantly impacts
the final result.
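
The arithmetic behind this area-weighted average can be checked directly. The short sketch below uses only the rounded values quoted in the text; the variable names are ours, and the 25/75 split is the approximate sky fraction of the KQ75y7 mask rather than an exact pixel count.

```python
# Area-weighted combination of the quadrupole power inside and outside the mask.
# All numbers are the rounded values quoted in the text, in (muK)^2.
c2_inside = 610.0    # pixel-based estimate inside the KQ75y7 mask
c2_outside = 80.0    # pixel-based estimate outside the mask
f_masked = 0.25      # approximate fraction of the sky removed by the mask

c2_weighted = f_masked * c2_inside + (1.0 - f_masked) * c2_outside
inside_share = f_masked * c2_inside / c2_weighted

print(f"weighted C_2 = {c2_weighted:.1f} (muK)^2")
print(f"share contributed by the masked region = {inside_share:.0%}")
```

With these inputs the weighted value is 212.5 $(\mu{\rm K})^2$, and the masked quarter of the sky supplies roughly 70 per cent of it.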

In a recent paper, Efstathiou et al. (2010) claimed that the low $S_{1/2}$ results are due
to the use of a suboptimal estimator (the pixel based estimator) of $C(\theta)$ and proposed
an alternative based on reconstructing the full sky. This proposal avoids addressing the
question of why the partial sky contains essentially no correlations on large angular scales
and instead focuses on a new question that centres on the issue of how the full sky is
reconstructed. In this work we carefully study full-sky reconstruction algorithms and
their effects on the low-$\ell$ CMB.

It is well known that contamination affects the reconstruction of the low multipoles
(Bielewicz et al. 2004; Naselsky et al. 2008; Liu & Li 2009; Aurich & Lustig 2010). In
particular, Aurich & Lustig (2010) have found that smoothing of a full sky map prior to
analysis, as required by a reconstruction algorithm (see Efstathiou et al. (2010) and our
discussion below), leaks information from the region inside the mask to pixels outside the
mask. They showed that the pixels outside the mask have errors that are a significant
fraction of the mean CMB temperature. They further found that it is safest to calculate the
two point angular correlation function on the cut-sky. Here we confirm and extend these
results.

Alternative analyses, such as that suggested in Efstathiou et al. (2010), must be performed
with care. In this work we carefully study the full-sky reconstruction, based on the cut-sky
data, in a Universe with low quadrupole power. In Sec. 2 we briefly present the formalism
typically employed in CMB studies. Sec. 3 contains our results and we conclude in Sec. 4.
Ultimately we find that if a full-sky map, such as the ILC, is a faithful representation of
the true CMB sky, then a reconstruction algorithm can reproduce its properties. This is not
surprising: if the full-sky map is already trusted, there is no need to perform a
reconstruction and nothing is gained by doing so. However, if part of the full sky is not
trusted or is known to be contaminated, then, by reconstructing without properly accounting
for the assumptions implicit in the algorithm, the final results will be biased toward the
full-sky values. Again this is not surprising: if information from the questioned region is
allowed to leak into the rest of the map then it will affect the final results and nothing
will be learned about the validity of the reconstruction.

In any reconstruction of unknown values from the properties of existing data, assumptions
must be made. Often these assumptions are not explicitly stated. For the work presented here
we take the observed microwave sky outside of the Galactic region as defined through the
KQ75y7 mask to be a fair sample of the CMB. This partial-sky region is known to have
essentially no correlations on large angular scales; it is unlikely in the best fit
$\Lambda$CDM model at the 99.975 per cent level (Copi et al. 2009). Our study shows the bias
introduced into full-sky reconstructions when an admixture of a region with larger angular
correlations is included prior to reconstruction. We stress that results of the partial-sky
analysis are not being questioned; instead a new question is being asked: how should the
full sky be reconstructed when there is a wide disparity between the statistical properties
of the region outside the Galaxy and that inside?



2 RECONSTRUCTION FORMALISM 

Optimal, unbiased estimators for both the $C_\ell$ and $a_{\ell m}$ are well known and
discussed extensively in the literature (see, for example, Tegmark 1997; Efstathiou 2004;
de Oliveira-Costa & Tegmark 2006; Efstathiou et al. 2010). Here we provide a brief overview
of the maximum likelihood estimator (MLE) technique and introduce our notation. For details
including discussions of invertibility of the matrices, proofs of optimality, etc., see the
references.

The microwave temperature fluctuations on the sky can be represented by the vector
$x(\hat e_j)$,

$$ x = Y a + n, \tag{3} $$

where $Y$ is the matrix of the $Y_{\ell m}(\hat e_j)$, $j$ runs over all pixels on the sky,
$\hat e_j$ is the radial unit vector in the direction of pixel $j$, $a$ is the vector of
$a_{\ell m}$ coefficients, and $n$ is the noise in each pixel. For the work considered here
we are only interested in the large-angle, low-$\ell$ behaviour so we assume that $n$ can be
ignored and set $n = 0$ in what follows. When working with the WMAP data at low resolution
this is justified; for example, the W band maps at NSIDE = 16 have pixel noise
$\sigma_{\rm pix} < 3\,\mu{\rm K}$. At higher resolution this is not as clearly justified. In
this work we study reconstruction bias independent of pixel noise so we may ignore $n$ for
our simulations. When setting $n = 0$ we are further assuming that the region we are
analysing is free of foregrounds. This is a standard, though implicit, assumption when
reconstructions are performed. The covariance matrix is then given by

$$ C = \langle x x^T \rangle = S. \tag{4} $$

Here the angle brackets, $\langle \cdot \rangle$, represent an ensemble average. This is the
expectation value of the theoretical two point angular correlation function, not its
measured value. As is customary, we call $S$ the signal matrix despite the fact that it is
not the two point angular correlation measured on the sky. We do not include a noise matrix,
$N$, in our covariance since we are neglecting noise.



2.1 Reconstructing the $a_{\ell m}$

To reconstruct the $a_{\ell m}$ we define the signal matrix as the two point angular
correlation function of the unreconstructed modes,

$$ C = S = \sum_{\ell=\ell_{\rm recon}+1}^{\ell_{\rm max}} C_\ell P^\ell. \tag{5} $$

Here $P^\ell$ is the matrix of the weighted Legendre polynomials,

$$ P^\ell_{ij} = \frac{2\ell+1}{4\pi} P_\ell(\hat e_i \cdot \hat e_j), \tag{6} $$

and we assume all modes with $2 \le \ell \le \ell_{\rm recon}$ are to be reconstructed. Here
$\ell_{\rm max}$ is the maximum multipole considered. We have chosen
$\ell_{\rm max} = 4\,{\rm NSIDE} + 2$ for this work. The optimal, unbiased estimator is then
given by (de Oliveira-Costa & Tegmark 2006)

$$ \hat a = W x, \qquad W \equiv [Y^T C^{-1} Y]^{-1} Y^T C^{-1}. \tag{7} $$

Note that here and throughout we work in the real spherical harmonic basis, so $Y$ is a real
matrix. The covariance matrix of our estimator is

$$ \Sigma \equiv \langle \hat a \hat a^T \rangle - \langle \hat a \rangle \langle \hat a \rangle^T = [Y^T C^{-1} Y]^{-1}. \tag{8} $$



The signal matrix, C, need not include all pairs of pixels 
on the sky. When it does, a reconstruction will produce precisely 
the spherical harmonic decomposition. Conversely, when a sky is 
masked, we only include the unmasked pixels in C. The process of 
'masking' is thus performed by removing the masked pixels from 
the signal matrix, and this process is equivalent to assigning infinite 
noise to the masked pixels. 
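
The estimator (7) and its covariance (8) can be sketched in a few lines of linear algebra. The example below is only an illustration under stated assumptions: the design matrix `Y_full` is a random stand-in for the real spherical harmonic matrix, `C` is an arbitrary symmetric positive definite matrix standing in for the signal matrix (5), and the function name `mle_alm` is ours. Masking is done, as described above, by dropping the masked rows.

```python
import numpy as np

def mle_alm(Y, C, x):
    """Return (a_hat, Sigma) with a_hat = [Y^T C^-1 Y]^-1 Y^T C^-1 x, Sigma = [Y^T C^-1 Y]^-1."""
    Cinv_Y = np.linalg.solve(C, Y)          # C^{-1} Y without forming C^{-1}
    Sigma = np.linalg.inv(Y.T @ Cinv_Y)     # estimator covariance, Eq. (8)
    a_hat = Sigma @ (Cinv_Y.T @ x)          # Eq. (7); uses the symmetry of C
    return a_hat, Sigma

rng = np.random.default_rng(0)
n_pix, n_modes = 40, 5

# Hypothetical stand-ins: random "harmonics" and random true coefficients.
Y_full = rng.normal(size=(n_pix, n_modes))
a_true = rng.normal(size=n_modes)
x_full = Y_full @ a_true                    # noise-free sky containing only these modes

# Mask the first 10 pixels by removing them from the data and the matrices.
keep = np.ones(n_pix, dtype=bool)
keep[:10] = False
n_keep = int(keep.sum())
Y, x = Y_full[keep], x_full[keep]

# Any symmetric positive definite matrix will do for this illustration.
A = rng.normal(size=(n_keep, n_keep))
C = A @ A.T + n_keep * np.eye(n_keep)

a_hat, Sigma = mle_alm(Y, C, x)
print(np.max(np.abs(a_hat - a_true)))
```

Because the toy data contain nothing but the reconstructed modes, the masked estimate recovers `a_true` to numerical precision; in the setting of the text the unreconstructed modes in (5) contribute the scatter described by $\Sigma$.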



2.2 Reconstructing the $C_\ell$

To reconstruct the $C_\ell$ we define the signal matrix as the two point angular correlation
of all the modes,

$$ C = S = \sum_\ell C_\ell P^\ell. \tag{9} $$

Notice that this differs from our previous definition (5). The optimal, unbiased estimator
for the $C_\ell$ is then constructed from an unnormalized estimator, $y_\ell$. Let

$$ y_\ell = x^T E^\ell x, \qquad E^\ell = \frac{1}{2} C^{-1} P^\ell C^{-1}. \tag{10} $$

The correlation matrix of this estimator is the Fisher matrix,

$$ F_{\ell\ell'} = \langle y_\ell y_{\ell'} \rangle - \langle y_\ell \rangle \langle y_{\ell'} \rangle = \frac{1}{2} {\rm Tr}[C^{-1} P^\ell C^{-1} P^{\ell'}]. \tag{11} $$

Finally, this gives the optimal, unbiased estimator of the $C_\ell$,

$$ \hat C_\ell = \sum_{\ell'} F^{-1}_{\ell\ell'} y_{\ell'}. \tag{12} $$

Though the full Fisher matrix can be calculated, it turns out to be nearly diagonal for
reasonably small masks such as the WMAP KQ75y7 mask. In this case the approximations

$$ F_{\ell\ell'} \approx \frac{2\ell+1}{2C_\ell^2}\,\delta_{\ell\ell'}, \qquad F^{-1}_{\ell\ell'} \approx \frac{2C_\ell^2}{2\ell+1}\,\delta_{\ell\ell'} \tag{13} $$

may be employed. We have confirmed the validity of this approximation and have employed it
when applicable in our subsequent analyses.
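
The chain (9) to (12) can be illustrated numerically. The sketch below is a toy under explicit assumptions: the matrices in `P` are random positive semi-definite stand-ins for the $P^\ell$ (built as $B B^T$ with $2\ell+1$ random columns), the power values in `C_true` are arbitrary, and the function name `estimate_cl` is ours. It checks the unbiasedness of (12) by replacing $x x^T$ with its ensemble average $C$.

```python
import numpy as np

rng = np.random.default_rng(1)
ells = [2, 3, 4, 5]
n_pix = 16

# Hypothetical stand-ins for the P^l matrices: B B^T with 2l+1 random columns each.
P = []
for ell in ells:
    B = rng.normal(size=(n_pix, 2 * ell + 1))
    P.append(B @ B.T)

C_true = np.array([100.0, 400.0, 250.0, 150.0])   # arbitrary illustrative C_l values
C = sum(c * p for c, p in zip(C_true, P))         # signal matrix, Eq. (9)
Cinv = np.linalg.inv(C)

# Fisher matrix, Eq. (11).
n = len(ells)
F = np.empty((n, n))
for i in range(n):
    for j in range(n):
        F[i, j] = 0.5 * np.trace(Cinv @ P[i] @ Cinv @ P[j])

def estimate_cl(x):
    """Quadratic estimator: y_l from Eq. (10), then C_hat = F^-1 y from Eq. (12)."""
    y = np.array([0.5 * x @ Cinv @ p @ Cinv @ x for p in P])
    return np.linalg.solve(F, y)

# Ensemble mean of y_l, using <x x^T> = C, and the corresponding mean estimate.
y_mean = np.array([0.5 * np.trace(Cinv @ p @ Cinv @ C) for p in P])
C_hat_mean = np.linalg.solve(F, y_mean)

# A single simulated data vector drawn with covariance C.
x = np.linalg.cholesky(C) @ rng.normal(size=n_pix)
C_hat_single = estimate_cl(x)
print(C_hat_mean, C_hat_single)
```

The mean estimate reproduces `C_true` because $\langle y_\ell \rangle = \sum_{\ell'} F_{\ell\ell'} C_{\ell'}$ follows directly from (9) and (11); any single realisation scatters around it.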



2.3 Relating the Estimators 

The optimal, unbiased estimators for $a_{\ell m}$ and $C_\ell$ are related to each other. If
we define the weighted harmonic coefficients by

$$ \beta = \Sigma^{-1} \hat a, \tag{14} $$

then

$$ y_\ell = \frac{1}{2} \sum_m |\beta_{\ell m}|^2 \tag{15} $$

is identical to (10), from which we may calculate $\hat C_\ell$ (de Oliveira-Costa & Tegmark
2006; Efstathiou et al. 2010).

In our discussion we have been careful to note that $C$ is defined differently when used as
an estimator for the $a_{\ell m}$ versus the $C_\ell$. In practice, when the signal-to-noise
is large the estimator for the $a_{\ell m}$ is not sensitive to the precise values and range
of the $C_\ell$ employed. However, to find $\hat C_\ell$ from $\hat a$ through the weighted
harmonic coefficients the full signal matrix (9) must be used when calculating the
covariance matrix (8) and Fisher matrix (11).

The above discussion shows that Eq. (12) is the optimal, unbiased estimator for the
$C_\ell$. Even so, given $\hat a$ from (7) it is tempting to define a naive estimator for
the $C_\ell$ via

$$ \tilde C_\ell = \frac{1}{2\ell+1} \sum_m |\hat a_{\ell m}|^2 \tag{16} $$

and use this to reconstruct $C(\theta)$ (see fig. 5 of Efstathiou et al. 2010). In general
this is a poor definition for the estimator, as clearly an optimal, unbiased estimator for
some quantity does not provide an optimal, unbiased estimator for the square of that
quantity. Its use leads to a biased estimator for the $C_\ell$ and a biased reconstruction
of $C(\theta)$. We will explore both this estimator and the optimal, unbiased one below.
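
One way to see the bias of the naive estimator: for an unbiased $\hat a$ with covariance $\Sigma$, the ensemble mean of $|\hat a_{\ell m}|^2$ is the true $a_{\ell m}^2$ plus the corresponding diagonal element of $\Sigma$. The sketch below evaluates that expectation for a made-up quadrupole; the coefficient values, the diagonal covariance, and the function name are illustrative assumptions, not values from this work.

```python
import numpy as np

def expected_naive_cl(a_true, Sigma):
    """Ensemble mean of sum_m a_hat_m^2 / (2l+1) for unbiased a_hat with covariance Sigma."""
    a_true = np.asarray(a_true, dtype=float)
    n_m = a_true.size                      # 2l+1 real coefficients
    return (np.sum(a_true ** 2) + np.trace(Sigma)) / n_m

# Hypothetical real quadrupole coefficients (muK) and estimator covariance (muK^2).
a2m = np.array([3.0, -4.0, 0.0, 5.0, 0.0])
Sigma = 6.0 * np.eye(5)

c2_sky = np.sum(a2m ** 2) / 5              # power actually present in the map
c2_naive_mean = expected_naive_cl(a2m, Sigma)

print(c2_sky, c2_naive_mean, c2_naive_mean - c2_sky)
```

The offset, ${\rm Tr}\,\Sigma/(2\ell+1)$, does not shrink with the true power, so in this toy picture the relative bias is largest when the true $C_2$ is small.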



2.4 Two Point Angular Correlation Function 

The two point angular correlation function is defined as a sky average, that is by a sum
over all pixels on the sky separated by the angle $\cos\theta_{ij} = \hat e_i \cdot \hat e_j$,

$$ C(\theta_{ij}) = \overline{x(\hat e_i)\, x(\hat e_j)}. \tag{17} $$

Ideally the two point angular correlation function would also contain an ensemble average
over realisations of the underlying model. Since we only have one Universe, this ensemble
average cannot be calculated. However, for a statistically isotropic Universe the sky
average and ensemble average are equivalent. This definition has the additional benefit that
it can be calculated on a fraction of the sky.

Alternatively the two point angular correlation function may be expanded in a Legendre
series,

$$ C(\theta) = \frac{1}{4\pi} \sum_\ell (2\ell+1)\, C_\ell P_\ell(\cos\theta). \tag{18} $$

Note that for partial sky coverage or lack of statistical isotropy the $C_\ell$ in this
expression are not the same as the $C_\ell$ obtained from the $a_{\ell m}$; see Copi et al.
(2007) for a discussion. This subtlety will not be important for the following work.



2.5 $S_{1/2}$ Statistic

To quantify the lack of large-angle correlations the $S_{1/2}$ statistic has been defined by
Spergel et al. (2003) to be

$$ S_{1/2} \equiv \int_{-1}^{1/2} [C(\theta)]^2 \, d(\cos\theta). \tag{19} $$

Expanding $C(\theta)$ in terms of the $C_\ell$ as above (18) we find

$$ S_{1/2} = \sum_{\ell,\ell'} C_\ell I_{\ell\ell'} C_{\ell'}, \tag{20} $$

where

$$ I_{\ell\ell'} \equiv \frac{(2\ell+1)(2\ell'+1)}{(4\pi)^2} \int_{-1}^{1/2} P_\ell(\cos\theta) P_{\ell'}(\cos\theta) \, d(\cos\theta) \tag{21} $$

is a known matrix (see Copi et al. 2010) that can be evaluated.
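
As a rough illustration of (20) and (21), the matrix $I_{\ell\ell'}$ can be evaluated by Gauss-Legendre quadrature on the interval $[-1, 1/2]$. This is a sketch, not the closed-form evaluation used in the cited reference; the function names and the default node count are our assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

def i_matrix(lmax, n_nodes=64):
    """I_{ll'} for 0 <= l, l' <= lmax by Gauss-Legendre quadrature on [-1, 1/2].

    The rule is exact for polynomial integrands of degree <= 2*n_nodes - 1,
    so the default is exact (up to rounding) for lmax <= 63.
    """
    nodes, weights = legendre.leggauss(n_nodes)
    lo, hi = -1.0, 0.5
    x = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)    # map [-1, 1] onto [-1, 1/2]
    w = 0.5 * (hi - lo) * weights
    V = legendre.legvander(x, lmax)                  # V[k, l] = P_l(x_k)
    norm = (2 * np.arange(lmax + 1) + 1) / (4 * np.pi)
    return np.outer(norm, norm) * ((V.T * w) @ V)

def s_half(cl):
    """S_{1/2} = sum_{l,l'} C_l I_{ll'} C_{l'}, with cl[l] indexed from l = 0."""
    cl = np.asarray(cl, dtype=float)
    return cl @ i_matrix(len(cl) - 1) @ cl

I = i_matrix(4)
s_quadrupole = s_half([0.0, 0.0, 100.0])             # pure quadrupole, C_2 = 100
print(I[2, 2], s_quadrupole)
```

For the quadrupole alone the integral of $P_2^2$ over $[-1, 1/2]$ is 0.2765625, which gives a convenient analytic check of the quadrature.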
The estimator generally employed for $S_{1/2}$ is

$$ \hat S_{1/2} = \sum_{\ell,\ell'} \hat C_\ell I_{\ell\ell'} \hat C_{\ell'}. \tag{22} $$

Even with $\hat C_\ell$ itself an optimal, unbiased estimator of $C_\ell$, this does not
produce an optimal, unbiased estimator for $S_{1/2}$ (Pontzen & Peiris 2010). For the
unbiased estimator (12) we have

$$ \langle \hat C_\ell \hat C_{\ell'} \rangle = \sum_{\ell_1,\ell_2} F^{-1}_{\ell\ell_1} F^{-1}_{\ell'\ell_2} \langle y_{\ell_1} y_{\ell_2} \rangle $$
$$ = \sum_{\ell_1,\ell_2} F^{-1}_{\ell\ell_1} F^{-1}_{\ell'\ell_2} \left[ F_{\ell_1\ell_2} + \langle y_{\ell_1} \rangle \langle y_{\ell_2} \rangle \right] $$
$$ = F^{-1}_{\ell\ell'} + \sum_{\ell_1} F^{-1}_{\ell\ell_1} \langle y_{\ell_1} \rangle \sum_{\ell_2} F^{-1}_{\ell'\ell_2} \langle y_{\ell_2} \rangle $$
$$ \approx C_\ell C_{\ell'} + \frac{2C_\ell^2}{2\ell+1}\,\delta_{\ell\ell'}. \tag{23} $$

In the second line we have used the definition of the Fisher matrix (11), the third line is
an algebraic simplification, and in the final line we have again used (12), the fact that
$\hat C_\ell$ is unbiased, and the approximation from (13). With this it is now
straightforward to see that

$$ \langle \hat S_{1/2} \rangle = \sum_{\ell,\ell'} I_{\ell\ell'} \langle \hat C_\ell \hat C_{\ell'} \rangle \approx S_{1/2} + \sum_\ell \frac{2C_\ell^2}{2\ell+1} I_{\ell\ell}. \tag{24} $$

It is thus clear that (22) is a biased estimator and, in fact, is biased toward larger
values of $S_{1/2}$.

As noted by Pontzen & Peiris (2010) this is of 'pedagogical interest' but does not affect
the studies of low $S_{1/2}$. The Monte Carlo simulations employed (see Copi et al. 2009,
for example) account for this bias. It does suggest that an alternative measure of the lack
of large-angle correlations is desirable.



2.6 Assumptions

Efstathiou et al. (2010) claim that the full-sky, large-angle CMB can be reconstructed
solely from the harmonic structure of the CMB outside the masked Galactic region, and
independent of the contents of the masked portion of the sky. We will demonstrate in what
follows that this claim does not hold up to closer scrutiny.

It is clear that without assumptions regarding the harmonic structure inside the masked
region nothing can be said about it. In principle the low-$\ell$ harmonic structure inside
the masked region could be anything, ranging from no power, to large power, to wild
oscillations, making the full-sky reconstruction impossible.

Assuming a cosmological origin for the observed microwave signal outside the masked region,
it seems reasonable to assume it will be consistent with the signal inside the masked
region. With that assumption, the harmonic structure outside the masked region can be
extended into the masked region. For actual, full-sky maps there is a further assumption:
the region inside the mask is well enough determined and statistically close enough to the
region outside the mask that it does not bias the reconstruction. This latter assumption
turns out not to be true, as we demonstrate below.

Note also that if the region inside the mask is trusted, then there is no need to perform
either masking or the reconstruction at all; the full-sky map can be analysed directly.
Therefore, validity of the stringent assumptions required for the reconstruction obviates
the very need for the reconstruction.

When the reconstruction formalism described above is applied to actual data, further
assumptions are implicit. In our development we have assumed that the temperature
fluctuations contain pure CMB signal. In practice, besides pixel noise (which we have not
included as described above) the data may contain unknown foregrounds. To avoid
contamination by foregrounds it is common to analyse a foreground-cleaned map, such as the
ILC map, and to mask the most contaminated regions of the sky. In following this approach,
care must be taken not to reintroduce contamination in the data prior to reconstruction. As
we will show below, the standard process of preparing data for reconstruction, in
particular smoothing the full-sky map, violates this requirement.



3 RESULTS 

To explore how data handling prior to reconstruction affects the results, we have performed
a series of Monte Carlo simulations of $\Lambda$CDM based on reconstruction procedures
suggested in the literature. We have employed the simplest best-fitting $\Lambda$CDM model
from WMAP based solely on the WMAP data. This is model "lcdm+sz+lens" with "wmap7" data
from the LAMBDA site. Our results are insensitive to the exact details of the model since
we are performing a theoretical study examining relative differences between
reconstructions and not performing parameter estimation. Our simulations are performed at
NSIDE = 128 unless otherwise noted and we will focus on the reconstruction of $a_{2m}$ and
$C_2$. Further, our simulations only consider $\ell_{\rm recon} = 10$, reconstruct from the
pixels outside the KQ75y7 mask provided by WMAP and degraded to the appropriate resolution,
and use the data from the WMAP seven year release.

A collection of realisations of the full sky is created as follows:

(i) Generate a random sky at NSIDE = 512 from the best-fitting $\Lambda$CDM model.

(ii) Extract the $a_{2m}$ and calculate the power in the quadrupole; denote this value by
$C_2$.

(iii) Rescale the $a_{2m}$ so that the $C_2$ in the map has a fixed value. For example,
rescale so that $C_2 = 100\,(\mu{\rm K})^2$ by replacing the $a_{2m}$ with
$a_{2m} \to a_{2m} \sqrt{100\,(\mu{\rm K})^2 / C_2}$. Notice that this does not change the
phase structure of the $a_{2m}$.

(iv) Smooth the map with a 10° Gaussian beam, if desired.

(v) Degrade the map to the desired resolution (NSIDE = 128 or NSIDE = 16).

(vi) Repeat the rescaling of the $a_{2m}$ for each value of $C_2$ that we wish to consider.
In our simulations we consider $C_2 \approx 10$ to $10^4\,(\mu{\rm K})^2$. This ensures that
the same map realisation is used with only the quadrupole power changed.

This procedure constitutes a single realisation. The results in this work are based on at
least 20,000 realisations.
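
The rescaling in steps (ii), (iii) and (vi) can be sketched as follows. This is a minimal illustration that works directly on the five real quadrupole coefficients rather than on a pixelised map; the function names, the random input and the list of target values are assumptions made for the example.

```python
import numpy as np

def quadrupole_power(a2m):
    """C_2 = sum_m a_{2m}^2 / 5 for the five real quadrupole coefficients."""
    return np.sum(np.asarray(a2m, dtype=float) ** 2) / 5.0

def rescale_quadrupole(a2m, c2_target):
    """Scale a_{2m} so that the quadrupole power equals c2_target, keeping its direction."""
    a2m = np.asarray(a2m, dtype=float)
    return a2m * np.sqrt(c2_target / quadrupole_power(a2m))

rng = np.random.default_rng(2)
a2m = rng.normal(scale=np.sqrt(1300.0), size=5)   # one illustrative random quadrupole

# Re-use the same realisation for every target power, as in step (vi).
targets = [10.0, 100.0, 1000.0]
rescaled = {c2: rescale_quadrupole(a2m, c2) for c2 in targets}

for c2, coeffs in rescaled.items():
    print(c2, quadrupole_power(coeffs))
```

Since every coefficient is multiplied by the same positive factor, the relative sizes and signs of the $a_{2m}$, and hence the phase structure, are unchanged.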

Figure 1. The 95 and 5 percentile lines for the $a_{2m}$ reconstructed from the pixels
outside the KQ75y7 mask at NSIDE = 128 (and thus $\ell_{\rm max} = 514$) of $\Lambda$CDM
realisations with $C_2 = 100\,(\mu{\rm K})^2$, as discussed in the text. The red, solid
lines are for the real part of the $a_{2m}$ and the blue, dashed lines are for the imaginary
part. The black, solid line shows the expected result for a perfect reconstruction. We see
that the reconstruction is unbiased, that is, it tracks the true value.

Degrading masks requires an extra processing step. Pixels near mask boundaries turn from the
usual 1 or 0, denoting inclusion or exclusion from the analysis, respectively, to
fractional values. We redefine our degraded masks by setting all pixels with a value greater
than 0.7 to 1 and all others to 0. For the KQ75y7 mask this process leaves about 70 per cent
of the pixels for analysis. To be precise, at NSIDE = 128 this leaves 136,828 unmasked
pixels and at NSIDE = 16 there are 2,157 pixels left.
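
The thresholding rule and the quoted pixel counts can be checked with a short sketch. The toy mask values and function names are ours, and the total pixel count assumes the HEALPix convention of $12\,{\rm NSIDE}^2$ pixels per map.

```python
import numpy as np

def rebinarise(degraded_mask, threshold=0.7):
    """Set pixels strictly above the threshold to 1 and all others to 0."""
    return (np.asarray(degraded_mask) > threshold).astype(int)

def healpix_npix(nside):
    """Number of pixels in a HEALPix map of the given NSIDE."""
    return 12 * nside ** 2

# Toy degraded mask: fractional values appear near the mask boundary.
degraded = np.array([1.0, 1.0, 0.75, 0.7, 0.5, 0.0])
binary = rebinarise(degraded)

# Fractions of the sky retained, using the unmasked pixel counts quoted in the text.
frac_128 = 136828 / healpix_npix(128)
frac_16 = 2157 / healpix_npix(16)
print(binary, round(frac_128, 3), round(frac_16, 3))
```

Both retained fractions come out close to 0.70, in line with the figure of about 70 per cent quoted above.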

A map with a modest angular resolution contains all the low-$\ell$ CMB information, so it
may seem surprising that we employ NSIDE = 128 in our studies. Instead it is common in
low-$\ell$ studies to employ a map at NSIDE = 16, corresponding to pixels of approximately
3° in size (see Efstathiou et al. 2010 for a recent example of this). The effects of the
choice of resolution, the need for smoothing a map prior to analysis, and the leaking of
information this causes will now be explored.



3.1 Choice of Map Resolution 

The study of large-angle, low-$\ell$ properties of the CMB appears naively not to require
high resolution maps. Maps degraded to the resolution corresponding to NSIDE = 16 are
commonly employed (de Oliveira-Costa & Tegmark 2006; Efstathiou et al. 2010). When a map is
degraded by averaging over pixels, high frequency noise is introduced, as may be seen in
Figs. 1 and 2. These figures show the reconstructed $a_{2m}$ using the optimal, unbiased
estimator from Eq. (7) for realisations with $C_2 = 100\,(\mu{\rm K})^2$. The solid, red
lines (dashed, blue lines) show the 5 and 95 percentile lines from our realisations for the
reconstructed real (imaginary) parts of each $a_{2m}$, using maps degraded to NSIDE = 128
(Fig. 1) and NSIDE = 16 (Fig. 2) and pixels outside the KQ75y7 mask. As expected from an
unbiased estimator the reconstructed values track the true values (Fig. 1). Further we see
that the $a_{21}$ are best determined and the $a_{20}$ and $a_{22}$ have larger variances
due to the mask, which produces greater admixture of ambiguous modes for these cases.
However, for NSIDE = 16 (Fig. 2) we see that the reconstruction does not track the true
values and is instead biased. This bias is due to the averaging done to degrade the maps and
becomes more significant the more the map is degraded. From this we conclude that the
coupling of the small-scale modes to the large-scale modes caused by using maps with
resolution that is too coarse can be at least partly responsible for reconstruction bias.

Figure 2. The same as Fig. 1 now reconstructed from the pixels outside the KQ75y7 mask at
NSIDE = 16 (and thus $\ell_{\rm max} = 66$). Here we see that the reconstruction is biased.



3.2 Smoothing the Map 

In practice raw degraded maps are not used for the reasons shown in the previous section;
instead the maps are smoothed with a Gaussian beam with FWHM of at least the size of the
pixels and then degraded. In this work we employ a smoothing scale of 10°, consistent with
Efstathiou et al. (2010). Smoothing the maps studied in the previous section prior to
reconstructing the $a_{\ell m}$ produces the results shown in Figs. 3 and 4. With smoothing
we see that the $a_{\ell m}$ estimator is unbiased for both resolutions, NSIDE = 128 and
NSIDE = 16. Smoothing is thus an essential step when working with low resolution maps.

In Figs. 3 and 4 we also see that the variance in the reconstructed values is resolution
dependent, with the smaller variance provided by the higher resolution maps. Again this is
not surprising, and can be understood as follows. Our covariance matrix in Eq. (8) does not
include a noise term, yet we have introduced noise by degrading. Smoothing does a good job
at reducing the noise to a level where the reconstruction is unbiased; however, there is
still residual noise that affects the covariance of the estimator. The higher the resolution
the smaller this noise. The best results are obtained by working at the highest resolution
that is feasible. For this reason we work at NSIDE = 128 in our simulations. See Appendix A
for technical details.
Figure 3. The same as Fig. 1 now reconstructed from maps smoothed with a 10° Gaussian beam
applied to the full sky map. As in Fig. 1 the reconstruction is unbiased.

Figure 4. The same as Fig. 3 now reconstructed from maps with NSIDE = 16. The
reconstruction is now also unbiased.

Figure 5. The same as Fig. 3 now with the masked region filled in with the ILC map prior to
smoothing and rescaling. We clearly see the reconstructed $a_{2m}$ are not unbiased. The
bias in reconstructing $a_{20}$ and $a_{22}$ is particularly apparent. This is due to the
leakage of information from inside the masked region.



3.3 Reconstructing the $a_{\ell m}$

We have now seen that the estimator in Eq. (7) is an optimal, unbiased estimator for the
$a_{\ell m}$ when we work at high resolution and/or smooth the maps prior to reconstruction
(Figs. 1, 3, 4). Although this has only been shown for $C_2 = 100\,(\mu{\rm K})^2$ we have
verified that this is true independent of the quadrupole power.



As noted above, the fact that we are smoothing the maps prior 
to masking imposes assumptions on the maps. For the realisations 
discussed above the assumptions are met; the region inside the 
mask is, statistically, identical to the region outside. However, for 
real data the Galaxy is a bright foreground that must be removed. 
The WMAP ILC procedure attempts to do this and produce a full- 
sky CMB map. Even so, masking is often performed to avoid rely- 
ing on the information inside this region since it may still be con- 
taminated by Galactic foregrounds. 

Unfortunately, when the map is smoothed information leaks out of the masked region and
biases the reconstruction as shown in Figs. 5 and 6. For this analysis, for each synthetic
map we filled the masked region with the corresponding portion (i.e. the masked region)
taken from the ILC map. We then smoothed and degraded the resulting synthetic map. In these
two figures we show the true and reconstructed values of the coefficients $a_{2m}$; we also
show the ILC map's $a_{2m}$ for reference. We clearly see the bias in the reconstructed
$a_{2m}$ and its correlation with the ILC values. If $a_{2m}^{\rm true} < a_{2m}^{\rm ILC}$
then the reconstructed $a_{2m}$ is biased upwards, and vice versa. For example, the ILC
$a_{22}$ values are large and negative, which leads to the reconstruction being skewed to
agree better at large, negative values than at large, positive values. This trend continues
for the other $a_{2m}$ and clearly shows that the smoothing has mixed in information from
the masked region.

We can also recognise other details in the quality of the reconstruction that are
specifically due to the orientation of the KQ75y7 mask in Galactic coordinates. For example,
we see that the variance in the reconstructed real part of $a_{22}$ is larger than that for
the imaginary part of $a_{22}$; the reason is that the real part of $Y_{22}$ has an extremum
in the centre of the Milky Way where the mask 'bulges' while the imaginary part has a node
at this location. Therefore, more information relevant to the real part of $a_{22}$ is
missing than for the imaginary part, and the former has a larger reconstruction error.
Moreover, it is also the case that the $Y_{20}$ and $Y_{22}$ have extrema in the Galactic
plane whereas $Y_{21}$ has nodes. Due to this the variances of $a_{20}$ and $a_{22}$ are
expected to be larger than that of $a_{21}$, as our reconstruction plots show.

Figure 6. The same as Fig. 5 now with $C_2 = 1000\,(\mu{\rm K})^2$.

Notice also that the reconstruction bias we find is not an artifact of the sharp transition
introduced in the process of filling the masked regions of simulated maps with the ILC
contents. The smoothing procedure, for one, completely removes the sharp feature in the map.
Moreover, we have explicitly checked that the reconstructed $a_{\ell m}$ are not biased when
the cut is filled with contents of another statistically isotropic map. Therefore, the
reconstruction bias seen in our plots is real, and is caused by the specific structure of
the ILC map behind the Galactic plane which 'leaks' into the unmasked region.

The question, then, is how to fill the masked region before 
smoothing. In principle anything could be used to fill the Galactic 
region, but then the information about this fill would leak outside 
the mask due to the smoothing. If the map were masked prior to 
smoothing then 'zero' would be leaked and bias the reconstruction. 
Alternatively, if the Galactic region were filled with Gaussian noise 
with root-mean-squared value consistent with the region outside 
the Galaxy then the estimator would be unbiased, similar to the
results in Fig. 1, but this would rely on the assumption that the true
CMB inside the mask has precisely the same statistical properties 
as the CMB in the region outside. Filling with the ILC values would 
make sense if we could be completely confident that the ILC re- 
construction of the region inside the mask is accurate. However, in 
the ILC the region inside the Galactic mask has different statistical 
properties than the region outside, particularly for the large-angle 
behaviour. This alone raises concerns that the ILC reconstruction is 
not entirely accurate. Further, if we knew how to properly treat the 
region inside the mask, either by accepting the ILC values or filling 
it with appropriate statistical values, there would be no need for a 
reconstruction as we would have a full sky map to analyse! 



Figure 7. The 95, 50, and 5 percentile lines of the reconstructed $C_2$ (top to bottom,
respectively) from our realisations, plotted against the true $C_2$ in $(\mu{\rm K})^2$.
The maps have not been smoothed prior to reconstruction. The pixel based estimate (blue,
solid line) comes from SpICE, whereas the reconstructed estimate (red, dashed line) is the
naive estimator (16). We see that this estimator is clearly biased toward larger
reconstructed values for small, true $C_2$, such as the values extracted from WMAP using
either the PCL or MLE procedures. For a value of $C_2$ near the $\Lambda$CDM value the
reconstruction method is a good estimator. The pixel based method produces values of $C_2$
with a median much closer to the true values, though with larger error bars.

The challenge is that there is no, or at least no unique, com- 
pelling choice of how to fill the masked region before smoothing. 
In the face of this, the approach we take below is to study how the 
admixture of the large-angle behaviour of the Galactic region from 
the ILC map affects the reconstruction of the low-ℓ CMB, particularly
when the region outside the Galaxy has low quadrupole power
and lack of large-angle correlations. We show how this particular 
choice biases the reconstruction. 

3.4 Reconstructing the C_ℓ

Since we are interested in reconstructing C(θ) we next need to reconstruct
the C_ℓ. From the a_ℓm we first proceed using the naive estimator (16),
referred to below as the naive C_ℓ (as used to generate fig. 5 of
Efstathiou et al. 2010).

The results for this estimator are shown in Fig. 7. For these
realisations the maps were not smoothed. The reconstruction is
shown as the dashed, red lines representing the 5, 50, and 95
percentile values as a function of the true C_2 used to generate the
maps. The solid, blue lines are the equivalent values from the
reconstruction based on the pixel estimator from SpICE. Again the
solid, black line is the reconstructed = true relation, plotted to guide
the eye. At large C_2 we see the desired behaviour: the reconstructed
values from both estimators are centred around the true value, and
the naive C_2 does have a smaller variance, as an optimal estimator should
(however, this does not mean it is optimal). At low C_2, in particular
near the WMAP PCL and MLE values, the pixel-based estimator
is still centred around the true value, though with large variance;
however, the naive C_2 is now biased toward larger values.
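
One way to see why a naive, quadratic estimator overestimates a small true C_2 is the following toy sketch. It assumes the naive estimator has the usual form, the average of the squared harmonic coefficients over the 2ℓ+1 modes, written here in a real spherical-harmonic basis where each coefficient has variance C_ℓ. The reconstruction error is modelled as independent Gaussian noise of a hypothetical variance err_var, which is a simplification: it is not the error model of this work, where the errors are correlated. The names naive_cl and mean_bias are illustrative only.

```python
import numpy as np

def naive_cl(a_real):
    """Naive estimate: mean of squared real-basis coefficients over the 2l+1 modes."""
    return np.mean(a_real ** 2, axis=-1)

def mean_bias(c_true, err_var, ell=2, n_real=20000, seed=0):
    """Average of (estimate from noisy a_lm) minus (estimate from true a_lm)."""
    rng = np.random.default_rng(seed)
    n_modes = 2 * ell + 1
    a_true = rng.normal(0.0, np.sqrt(c_true), (n_real, n_modes))
    a_err = rng.normal(0.0, np.sqrt(err_var), (n_real, n_modes))  # assumed error model
    return np.mean(naive_cl(a_true + a_err) - naive_cl(a_true))

# Hypothetical numbers in (muK)^2: the additive bias is set by the error variance,
# so it is a large fraction of a small C_2 and a small fraction of a large one.
for c2 in (100.0, 1000.0):
    bias = mean_bias(c2, err_var=50.0)
    print(f"true C_2 = {c2:7.1f}  mean bias = {bias:6.1f}  relative = {bias / c2:.2f}")
```

Because the estimator is quadratic in the a_ℓm, any zero-mean error in the reconstructed coefficients adds its variance to the estimate. Under these assumptions the additive offset is roughly the same at every true C_2, so it matters most when the true value is low.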

The results in Fig. 7 were for unsmoothed maps. The usual
approach is to smooth the maps, which suppresses power on
small scales (high ℓ). Fig. 8 shows the results when the maps are
smoothed prior to reconstruction; they are encouraging. Both
estimators now track the true values much more closely. Even the
median of the naive C_2 remains close to the true value for values near
the WMAP PCL value. This shows that with smoothing the correlations
are reduced due to the lack of high frequency noise. It suggests that
smoothing the map, reconstructing the a_ℓm, and employing the naive C_ℓ
as our estimator is sufficient and nearly optimal.

Figure 8. The same as Fig. 7 now with the realisations smoothed to 10°
prior to reconstruction. It appears that the estimator does a better job
of reproducing C_2 for a ΛCDM model, though see Fig. 9.

Figure 9. The same as Fig. 8 now with the region inside the mask
replaced by the ILC prior to smoothing to 10° and reconstructing. Clearly
the ILC information from inside the masked region has leaked out, biasing
the reconstruction. Not surprisingly, the reconstruction is now only
accurate near the WMAP MLE value, the value consistent with this region of
the ILC. At lower C_2 values the reconstruction plateaus to this value as it
is the main contribution to quadrupole power. At higher values of C_2 the
quadrupole power is suppressed by the leakage.

Figure 10. Similar to Fig. 8 now comparing the naive harmonic coefficient
estimator (16), again as the dashed, red lines, to the weighted harmonic
coefficient estimator (12), shown as the green, solid lines. We see that the
weighted harmonic coefficient estimator is unbiased over the full true C_2
range.

Unfortunately this is not the case. As noted above, smoothing
makes assumptions about the validity of the region inside the
mask. We saw that even for the a_ℓm this leads to a bias (see Fig. 5).
When the corresponding ILC portion is placed into the masked region
prior to smoothing, the naive C_2 is also biased, as shown in Fig. 9.
We see that the masked region drives the naive C_2 to be near the value
inside the mask (approximately the WMAP MLE value). The naive C_2
results are biased upward for very small C_2 and downward for large
C_2. Thus, even though smoothing helps in removing the correlation
bias in the naive estimator, it introduces its own bias. How the masked
region is filled determines how the distribution of the difference
between the reconstructed and true C_2 will be skewed. Roughly speaking,
the values inside the mask will be favoured, raising the reconstructed
values that are lower than the masked region values and lowering values
that are higher than those from the masked region.

We have seen that the naive estimator provides an unbiased
estimate of C_2 when the true value is near the expected
ΛCDM value. However, when the true value is low this estimator
tends to overestimate C_2. Further, when smoothing is applied the
reconstruction skews the values towards those consistent with the
region inside the mask. This is to be expected. In fact, if the region
inside the mask were believed then there would be no need to
reconstruct at all: a full-sky map would already exist and it could be
used for analysis without this extra effort.



3.5 Optimal, Unbiased C_ℓ Estimator

The general behaviour found for the naive estimator carries over
to the optimal, unbiased C_ℓ estimator based on the weighted
harmonic coefficients (12), as we now see. Calculating this estimator
for the realisations considered above we find the results in Fig. 10 for
Gaussian smoothed maps. This figure should be compared to Fig. 8. We see
that the optimal C_2 is nearly unbiased over the full range of true C_2,
as expected.

The effect of smoothing when the ILC is inserted into the
masked region is shown in Fig. 11. Again we see the bias
introduced by smoothing when the two regions do not contain the same
structure. These results are qualitatively similar to those found in
Fig. 9 and the same discussion applies.



3.6 Reconstructing Without Smoothing 

The reconstruction of the a_2m without smoothing showed that for
NSIDE = 128 the reconstruction was unbiased (Fig. 1) but for
NSIDE = 16 there was a resolution dependent bias (Fig. 2).
Calculating the weighted harmonic coefficient estimator (12) from these
realisations produces the results in Fig. 12. At first glance these
results are surprising and encouraging. The green, solid lines for
NSIDE = 128 and red, dashed lines for NSIDE = 16 nearly overlap
and the central value very closely follows the true value. This
is surprising since the a_2m at NSIDE = 16 are biased and have
smaller variance than the corresponding NSIDE = 128 values (see Figs. 1
and 2). Even so, when combined to determine C_2 these differences
average out and lead to nearly identical predictions.

Figure 11. The same as Fig. 10 now for the Galactic region filled with the
ILC values. We see that both estimators are now biased to agree best near
the WMAP MLE value, as we saw in Fig. 9.

Table 1. S_1/2 values for the ILC map calculated for 2 ≤ ℓ ≤ 10. The map
is unprocessed, Gaussian smoothed with a 10° beam, or had the Galactic
region filled with a Gaussian random, statistically isotropic sky realisation
with the same power spectrum as the region outside the mask prior to
smoothing. The values are calculated for the full sky and for the KQ75y7
masked sky at NSIDE = 128 using a pixel-based estimator or the optimal,
unbiased C_ℓ estimator (12) from maps at NSIDE = 128 and NSIDE = 16.
The last row refers to the map whose masked area has been filled with a
Gaussian random field whose power is consistent with the power measured
outside the mask.

Based on Fig. 12 we may think we have solved the reconstruction
problem: just reconstruct using the optimal, unbiased C_ℓ
estimator (12) without smoothing! Unfortunately we cannot draw
this conclusion from the results presented here. Recall that the
reconstructions have been performed on noise-free, pure CMB maps.
Real maps contain noise and potentially residual, unmasked
foregrounds. In particular, uncorrected, diffuse foregrounds are known
to contaminate the low-ℓ reconstruction (Naselsky et al. 2008). A
careful study of the issues faced when applying the reconstruction
to real data is beyond the scope of this work and will be reserved
for future study. However, naive application of this method to real
data yields highly biased reconstructions.

Figure 12. The same as Fig. 10 now comparing the weighted harmonic
coefficient estimator for NSIDE = 128 as the green, solid lines and
NSIDE = 16 as the red, dashed lines without smoothing the map prior to
reconstruction. Since there is no smoothing, the results do not depend on the
contents of the masked Galactic region. We see that the reconstruction
without smoothing is unbiased for most of the C_2 range; however, see
Sec. 3.6 for a discussion of its inapplicability to real data.



3.7 S_1/2 Estimator

The study of the S_1/2 statistic is a large project in its own right
and will not be pursued in detail here. Our Universe as encoded
in the ILC map contains a somewhat small full-sky S_1/2 and an
extremely small cut-sky S_1/2. If we are to perform a statistical
study of S_1/2 we could enforce this structure, that is, only
choose skies that have somewhat low full-sky and very low cut-sky
S_1/2 values. Alternatively we could choose from an ensemble
based on the best-fitting ΛCDM model. In the latter case it has
already been shown that the ILC map is a rare realisation, unlikely
at the 99.975% level (Copi et al. 2009). The assumptions made in
any study will determine the statistical questions that can be asked.
Conversely, the statistical questions asked will implicitly contain
the assumptions imposed.

In Table 1 we show the S_1/2 for the ILC map calculated
from (22) under various assumptions. Note that these values all
contain the bias discussed in Sec. 2.5, as is standard in the
literature. Shown in the table are the values calculated for the full sky
and for the partial sky where the KQ75y7 mask is employed to
cut out the Galactic region. The cut-sky results are calculated
using the pixel-based estimator of SpICE and the optimal, unbiased
C_ℓ estimator (12) from reconstructed maps at NSIDE = 128 and
NSIDE = 16. Further, the results are shown for different map
processing, including no processing (the unsmoothed entry where the
map has only been degraded as required for the reconstruction),
employing a 10° Gaussian smoothing, and filling the Galactic
region with a realisation that has the same power in each ℓ-mode as
the region outside the mask but with the phases randomised.
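
For orientation, S_1/2 can be evaluated from a set of C_ℓ by integrating the square of the two-point function. The sketch below assumes the standard definitions C(θ) = (1/4π) Σ_ℓ (2ℓ+1) C_ℓ P_ℓ(cos θ) and S_1/2 = ∫ [C(θ)]^2 d(cos θ) over −1 ≤ cos θ ≤ 1/2. It is not necessarily the form of (22), which is not reproduced in this section. The function name s_half and the toy spectrum are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def s_half(cl, ell_min=2, ell_max=None, upper=0.5):
    """S_1/2 from C_l (array indexed from l = 0), using multipoles ell_min..ell_max."""
    cl = np.asarray(cl, dtype=float)
    if ell_max is None:
        ell_max = cl.size - 1
    ells = np.arange(cl.size)
    # Legendre-series coefficients of C(theta) in x = cos(theta)
    coeffs = (2 * ells + 1) * cl / (4.0 * np.pi)
    coeffs[:ell_min] = 0.0
    coeffs[ell_max + 1:] = 0.0
    # Gauss-Legendre nodes on [-1, 1] mapped to [-1, upper]; with cl.size + 1
    # nodes the quadrature is exact for the polynomial integrand C(x)^2.
    x, w = legendre.leggauss(cl.size + 1)
    lower = -1.0
    x_mapped = 0.5 * (upper - lower) * x + 0.5 * (upper + lower)
    w_mapped = 0.5 * (upper - lower) * w
    c_theta = legendre.legval(x_mapped, coeffs)
    return np.sum(w_mapped * c_theta ** 2)

# Toy spectrum (hypothetical amplitude), l(l+1) C_l = const, for 2 <= l <= 10.
ells = np.arange(11)
cl_toy = np.zeros(11)
cl_toy[2:] = 6000.0 / (ells[2:] * (ells[2:] + 1.0))
print("S_1/2 of the toy spectrum:", s_half(cl_toy, ell_min=2, ell_max=10))
```

Restricting the sum to 2 ≤ ℓ ≤ 10 mirrors the multipole range quoted for Table 1; the same function can be applied to C_ℓ from any of the estimators discussed in the text.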

The results in Table 1 are consistent with what we have found
for the C_ℓ reconstructions. For the unsmoothed map the full-sky
and pixel-based estimators calculated at NSIDE = 128 show the
usual result, the large discrepancy between the full and cut-sky
values. This holds true for the smoothed map also. Further, the
reconstructed values show the large discrepancy between the
unsmoothed and smoothed maps (see, for example, Figs. 2 and 4). We
also see that the reconstructed values are systematically larger than
the pixel-based estimator, showing that the reconstruction is more
sensitive to leakage of information from inside the masked region.
Finally, the last line of the table shows the expected behaviour for
a map where the full sky has power consistent with that from the
cut sky. Notice that all of the cut-sky pixel-based results are
consistent with each other since information leakage is unimportant. The
small difference between the NSIDE = 128 and NSIDE = 16
reconstructions shows the residual sensitivity to resolution.

Figure 13. The 95 and 5 percentile lines for the a_3m reconstructed from the
pixels outside the KQ75y7 mask at NSIDE = 128 (and thus ℓ_max = 514)
of ΛCDM realisations with C_2 = 100 (μK)^2 and the masked region filled
in with the ILC map prior to smoothing and rescaling, as discussed in the
text. The red, solid lines are for the real part of the a_3m and the blue,
dashed lines are for the imaginary part. The black, solid line shows the
expected result for a perfect reconstruction. We clearly see the
reconstructed a_3m are not unbiased. Now the bias is most prominent for
a_33, the octopole mode with all its extrema within the Galactic plane. This
figure should be compared to Fig. 5.



3.8 Higher Multipoles 

In this work we have focused on how data handling affects the
reconstruction of the quadrupole. The quadrupole serves as an
example of the general behaviour. As shown in Figs. 13 and 14 we see the
same results for ℓ = 3 and ℓ = 4. These figures were generated
from the same realisations employed in making Fig. 5. Again we
see that the reconstruction is biased toward the values from the ILC
map.




Figure 14. The same as Fig. 13 now for ℓ = 4 (horizontal axis: true a_4m in μK).

4 CONCLUSIONS 

It has been argued that the large-angle CMB can be reliably
reconstructed from partial-sky data and that when this is done the lack of
large-angle correlation is not significantly deviant from the
expectation (Efstathiou et al. 2010). At first glance the argument appears
sound. The large-angle modes extend over large fractions of the
sky, thus knowing their values on one region of the sky allows us
to extrapolate them into the masked regions. However, in practice
and under close scrutiny this argument fails. Implicit assumptions
built into the reconstruction process enforce agreement between
the reconstruction and the previously constructed full sky (the ILC
map in this case) through mixing of information from inside the
masked region to that outside. Due to this the reconstruction has no
value independent of the original full-sky map. It neither confirms
nor denies the validity of that map.

To study the large-angle CMB a choice must be made on what
data to take as a fair representation of the CMB sky. One choice
is to accept a cleaned, full-sky map, such as the ILC map
produced by WMAP, to accurately represent the primordial CMB sky.
In this case the full-sky map may be analysed with no
reconstruction required. In Copi et al. (2009) and in this work, however, we
have taken the region outside the Galaxy as defined by the WMAP
KQ75y7 mask to be a fair representation. We have shown that the
large-angle CMB can be reconstructed using unbiased estimators
for the a_ℓm and C_ℓ; however, the standard approach requires
processing the original map by degrading and smoothing it.
Unfortunately it is precisely the smoothing process that mixes the region
we have taken as a fair representation of the CMB with the region
we are trying to exclude. When the excluded region has the same
statistical properties as the region we are including then no biases
are introduced. On the other hand, when, as is the case with the
ILC map, the properties are significantly different the
reconstruction is biased to agree with the full map. This is not surprising.
Through this process one is trusting the full-sky map, mixing
information from it into the rest of the sky, then reconstructing it. This
is a circular process and is unnecessary. If the full-sky map is
already trusted then there is no point in performing a reconstruction
to produce a poorer version of the original map.

The important point is that even in principle reconstructing 
following the standard approach leads to biased results unless the 
full-sky CMB is already known. We have shown for noise free, 
pure CMB maps that smoothing mixes information and biases the 
results. When applied to real data the problems only get worse. En- 
couragingly we also found that in principle reconstructing without 
smoothing leads to unbiased results. Unfortunately, directly applying
this to real data with noise and residual, unmasked foregrounds
yields highly biased reconstructions, so further care is required to apply
this method successfully to real-world CMB data.

Overall the question of how to perform an unbiased
reconstruction of the full large-angle CMB sky remains an interesting
one. Previous work (Bielewicz et al. 2004; Naselsky et al. 2008;
Liu & Li 2009; Aurich & Lustig 2010) has shown that contamination
significantly affects the reconstruction of the large-angle multipole
moments. Aurich & Lustig (2010) studied the case most similar to
that considered in this work. They showed that smoothing of a
full-sky map leaks information from the pixels not used in the
reconstruction (those in a mask) to the pixels that will be used. In this
work we have extended their result and shown how a
reconstruction such as that performed by Efstathiou et al. (2010) is biased due
to this leakage of information. This shows the fundamental problem
with trying to reconstruct the full sky from a partial sky.

Fortunately large-angle CMB studies are not dependent on
reconstructed full-sky maps. The partial sky when used consistently
(see Copi et al. 2009, for example) has been shown to be a
robust representation of the large-scale CMB by Aurich & Lustig
(2010) and in this work. Despite the fact that such an approach
is suboptimal in the sense that the inferred C_ℓ do not have the
smallest possible variance, it is far less biased than the 'optimal'
C_ℓ inferred through the maximum-likelihood reconstruction. More
robust statements about the large-angle CMB behaviour may
therefore be made with the partial-sky, pixel-based C_ℓ.

We conclude that the lack of large-angle correlation, particu- 
larly on the region of the sky outside the Galaxy, remains a matter 
of serious concern. 
APPENDIX A: RECONSTRUCTING AT HIGH RESOLUTION

Computationally the time and memory intensive step in
reconstructing the a_ℓm and C_ℓ with our estimators (including Eq. 12) is
the inversion of the covariance matrix, C. Fortunately this step only needs
to be performed once for each choice of resolution, NSIDE, and mask.

The covariance matrix is of size N_pix × N_pix where the
number of pixels is given by 12 (NSIDE)^2 and the size of C scales
as (NSIDE)^4. An increase in resolution by one step, NSIDE →
2 NSIDE, increases the size of C by a factor of 16. Working with
cut skies does not appreciably reduce this: even the largest mask,
KQ75y7, only cuts out 25-30 percent of the pixels. Resolutions of
NSIDE = 32 or perhaps even NSIDE = 64 are attainable on a
desktop computer. Fortunately we never need to store the full C and
can calculate the elements of C as required instead of storing them.

In our estimators all the matrices that we encounter, except
for C^{-1}, are of size N_ℓ × N_pix or smaller. Here N_ℓ = (ℓ_recon +
1)^2. Even for NSIDE = 512 and ℓ_recon ~ 10 these matrices only
require about 3 GB of storage at double precision. Further we see
that only the matrix

M = C^{-1} Y    (A1)

is ever required (see Eqs. 7 and 8).

To compute M we note that it satisfies the set of linear
equations

C M = C C^{-1} Y = Y.    (A2)

Solving such a set of equations is a standard problem in
computational linear algebra. A covariance matrix is symmetric and
positive-definite so it may be factored with a Cholesky
decomposition (Press et al. 1992),

C = L L^T,    (A3)

where L is a lower triangular matrix. Our problem then becomes
solving

L (L^T M) = L Z = Y.    (A4)

This can be solved in two steps using forward substitution on
L Z = Y to find Z, followed by backward substitution on L^T M = Z to
find M.
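
The two-step solve in (A2)-(A4) can be sketched as follows. This is a small dense illustration using NumPy, not the implementation used for this work; the function names, the matrix sizes and the random test matrices are assumptions made for the example. Since L is lower triangular, the first triangular solve proceeds from the first row downward and the second, with L^T, from the last row upward.

```python
import numpy as np

def solve_lower(L, B):
    """Solve L Z = B for lower-triangular L, working from the first row down."""
    n = L.shape[0]
    Z = np.zeros_like(B, dtype=float)
    for i in range(n):
        Z[i] = (B[i] - L[i, :i] @ Z[:i]) / L[i, i]
    return Z

def solve_upper(U, B):
    """Solve U X = B for upper-triangular U, working from the last row up."""
    n = U.shape[0]
    X = np.zeros_like(B, dtype=float)
    for i in range(n - 1, -1, -1):
        X[i] = (B[i] - U[i, i + 1:] @ X[i + 1:]) / U[i, i]
    return X

def solve_spd(C, Y):
    """Return M = C^{-1} Y via C = L L^T without forming C^{-1} explicitly."""
    L = np.linalg.cholesky(C)      # lower-triangular factor, as in (A3)
    Z = solve_lower(L, Y)          # L Z = Y
    return solve_upper(L.T, Z)     # L^T M = Z

# Small symmetric positive-definite test matrix (hypothetical sizes).
rng = np.random.default_rng(1)
n_pix_toy, n_ell_toy = 50, 4
A = rng.normal(size=(n_pix_toy, n_pix_toy))
C = A @ A.T + n_pix_toy * np.eye(n_pix_toy)
Y = rng.normal(size=(n_pix_toy, n_ell_toy))

M = solve_spd(C, Y)
print("max |C M - Y| =", np.max(np.abs(C @ M - Y)))
```

At realistic resolutions the explicit Python loops would be replaced by an optimised triangular solver, but the order of operations is the same.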

At this point we are left with computing L. Approximately half
of this matrix is zero so only half of it needs to be stored (of course
only half of C needs to be stored as well, since it is symmetric).
Unfortunately this cannot be further reduced and this provides the
limiting factor in determining the resolution at which we can work. For
NSIDE = 128 and ℓ_recon ~ 10 the matrix L is approximately 70 GB in size.
Improving resolution to NSIDE = 256 increases the required storage
to over 1 TB. This is what has limited our work to NSIDE = 128.
Straightforward, numerically stable algorithms exist for calculating
L (see Press et al. 1992, for example). Though this is a time
consuming step, once M is calculated the rest follows quickly.
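
The storage figures quoted in this appendix can be checked with a short calculation. The sketch below assumes double precision (8 bytes per element), a retained sky fraction of 0.7 (the mask removes 25-30 percent of the pixels) and ℓ_recon = 10; the function names are hypothetical and the results are printed in GiB, so they match the quoted round numbers only approximately.

```python
def n_pix(nside):
    """Number of pixels on the full sky at a given NSIDE."""
    return 12 * nside ** 2

def storage(nside, ell_recon=10, sky_fraction=0.7, bytes_per_element=8):
    """Approximate storage (bytes) for the matrices discussed in the appendix."""
    n_used = int(sky_fraction * n_pix(nside))      # pixels outside the mask (assumed fraction)
    n_ell = (ell_recon + 1) ** 2
    return {
        "n_used": n_used,
        "cov_bytes": n_used ** 2 * bytes_per_element,                     # full C
        "chol_bytes": n_used * (n_used + 1) // 2 * bytes_per_element,     # triangle of L
        "y_bytes": n_ell * n_pix(nside) * bytes_per_element,              # N_ell x N_pix matrix
    }

GIB = 2 ** 30
for nside in (16, 128, 256, 512):
    s = storage(nside)
    print(f"NSIDE = {nside:3d}: L triangle = {s['chol_bytes'] / GIB:10.3f} GiB, "
          f"N_ell x N_pix matrix = {s['y_bytes'] / GIB:6.3f} GiB")
```

Doubling NSIDE quadruples the pixel count and so multiplies the storage for C or L by 16, which is why the triangular factor, rather than the N_ℓ × N_pix matrices, sets the practical resolution limit.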